Search resource list
WPCrawler-master
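- A web crawler implemented with Java and MySQL, targeting a single WordPress site. It uses the following open-source libraries: Apache HttpComponents 4.3, HTML Parser 2.0, and MySQL Connector/J 5.1.27. It uses UTF-8 encoding so that Chinese tags can be recorded, connects to the XAMPP default MySQL port at localhost:3306, and requires a local XAMPP environment.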
SearchEngine
- dySE is a small open-source search engine written in Java. It is divided into three modules: a crawler module, a preprocessing module, and a search module. It explains in detail how multithreaded page crawling, main-content extraction, text extraction, word segmentation, index building, and snapshots are implemented.
webmagic-master
- A crawler framework written in Java. Apart from having no built-in way to get around anti-crawler measures (which you can add yourself), it is very capable.
1-120P1142U8
- A crawler program implemented in Java that can download resources from the web.
java_crawler(cookie)-
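- A page-fetching (crawler) program written in Java. Fetching ordinary pages is straightforward, so this code focuses on pages that require cookie authentication. The code is fairly simple; download it and read through it yourself.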
MISS
- A web crawler written as a simple Java servlet program.
bb-WeiBo-master
- A Java microblog (Weibo) crawler with support for database operations.
Spider
- A small web crawler written in Java that uses regular expressions to extract key information.
NTP
- A web crawler implemented in Java that searches for hosts on the Internet and analyzes the hierarchical structure of NTP (Network Time Protocol).
ZhihuDown
- A web crawler written in Java that can crawl text from Zhihu and other websites. It is easy to understand and can easily be modified to crawl key fields from other sites.
webmagic
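- An open-source vertical crawler framework for Java. Its goal is to simplify the crawler development process so that developers can focus on application logic. The core of webmagic is very simple yet covers the entire crawling workflow, which also makes it good material for learning crawler development. The author spent a year developing vertical crawlers at a previous company, and webmagic was created to eliminate some of the repetitive work involved.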
threadTest
- A simple crawler written in Java that fetches the pages linked from a user-specified page. The fetched files can be saved to a user-specified directory.
WeiboSpider-master
- A Weibo (microblog) crawler written in Java.
Spider
- Source code for a Java web spider (crawler) that automatically traverses websites and retrieves remote data from the web according to a given strategy.
CatchNews
- A page-scraping program written in Java that analyzes web page content with regular expressions.
JarsCrawler
- A multithreaded crawler tool written in Java. It can be adapted into a crawler for other topics; this version mainly crawls JAR files.
CquNews
- A Lucene-based news search engine that uses a web crawler written in Java to fetch its data.
Url
- A Java crawler that fetches images by keyword. You can set the number of images to fetch, the directory they are downloaded to, and the image resolution.
sinaweibo
- An example web crawler written in Java, useful as a reference.
weibo3.2
- WebCollector is a Java crawler framework (kernel) that requires no configuration and is easy to extend. It provides a streamlined API, so a powerful crawler can be implemented with only a small amount of code. WebCollector-Hadoop is the Hadoop version of WebCollector and supports distributed crawling.
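
Several entries above (Spider, CatchNews) describe extracting information from pages with regular expressions. As a rough illustration only, and not code taken from any of the listed projects, a minimal Java sketch of that idea might look like the following. The class name `LinkExtractor`, the method `extractLinks`, and the pattern itself are hypothetical, and the pattern is deliberately simple: it only handles quoted `href` attributes, so a real crawler would more likely use an HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any project listed above.
public class LinkExtractor {
    // Matches the quoted href value inside an <a ...> tag (single or double quotes).
    // Assumption: the attribute is quoted and the tag is well-formed.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns every href value found in the given HTML, in document order.
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2)); // group 2 is the text between the quotes
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://example.com/a\">A</a> "
                + "<a class='x' href='/b'>B</a></p>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

A crawler would typically resolve each extracted value against the page URL and add it to a queue of pages still to visit.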